2.
Nature ; 627(8004): 671-679, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38448585

ABSTRACT

DNA and histone modifications combine into characteristic patterns that demarcate functional regions of the genome [1,2]. While many 'readers' of individual modifications have been described [3-5], how chromatin states comprising composite modification signatures, histone variants and internucleosomal linker DNA are interpreted is a major open question. Here we use a multidimensional proteomics strategy to systematically examine the interaction of around 2,000 nuclear proteins with over 80 modified dinucleosomes representing promoter, enhancer and heterochromatin states. By deconvoluting complex nucleosome-binding profiles into networks of co-regulated proteins and distinct nucleosomal features driving protein recruitment or exclusion, we show comprehensively how chromatin states are decoded by chromatin readers. We find highly distinctive binding responses to different features, many factors that recognize multiple features, and that nucleosomal modifications and linker DNA operate largely independently in regulating protein binding to chromatin. Our online resource, the Modification Atlas of Regulation by Chromatin States (MARCS), provides in-depth analysis tools to engage with our results and advance the discovery of fundamental principles of genome regulation by chromatin states.


Subjects
Chromatin Assembly and Disassembly; Chromatin; Nuclear Proteins; Nucleosomes; Proteomics; Humans; Binding Sites; Chromatin/chemistry; Chromatin/genetics; Chromatin/metabolism; DNA/genetics; DNA/metabolism; Enhancer Elements, Genetic; Heterochromatin/genetics; Heterochromatin/metabolism; Histones/metabolism; Nuclear Proteins/analysis; Nuclear Proteins/metabolism; Nucleosomes/chemistry; Nucleosomes/genetics; Nucleosomes/metabolism; Promoter Regions, Genetic; Protein Binding; Proteomics/methods
3.
Hypertension ; 81(5): 1156-1166, 2024 May.
Article in English | MEDLINE | ID: mdl-38445514

ABSTRACT

BACKGROUND: Hypertension, a complex condition, is primarily defined based on blood pressure readings without involving its pathophysiological mechanisms. We aimed to identify biomarkers through a proteomic approach, thereby enhancing the future definition of hypertension with insights into its molecular mechanisms. METHODS: The discovery analysis included 1560 participants, aged 55 to 74 years at baseline, from the KORA (Cooperative Health Research in the Region of Augsburg) S4/F4/FF4 cohort study, with 3332 observations over a median of 13.4 years of follow-up. Generalized estimating equations were used to estimate the associations of 233 plasma proteins with hypertension and systolic blood pressure (SBP). For validation, proteins significantly associated with hypertension or SBP in the discovery analysis were validated in the KORA Age1/Age2 cohort study (1024 participants, 1810 observations). A 2-sample Mendelian randomization analysis was conducted to infer causalities of validated proteins with SBP. RESULTS: Discovery analysis identified 49 proteins associated with hypertension and 99 associated with SBP. Validation in the KORA Age1/Age2 study replicated 7 proteins associated with hypertension and 23 associated with SBP. Three proteins, NT-proBNP (N-terminal pro-B-type natriuretic peptide), KIM1 (kidney injury molecule 1), and OPG (osteoprotegerin), consistently showed positive associations with both outcomes. Five proteins demonstrated potential causal associations with SBP in Mendelian randomization analysis, including NT-proBNP and OPG. CONCLUSIONS: We identified and validated 7 hypertension-associated and 23 SBP-associated proteins across 2 cohort studies. KIM1, NT-proBNP, and OPG demonstrated robust associations, and OPG was identified for the first time as associated with blood pressure. For NT-proBNP (protective) and OPG, causal associations with SBP were suggested.


Subjects
Hypertension; Proteomics; Humans; Blood Pressure/physiology; Cohort Studies; Biomarkers; Natriuretic Peptide, Brain; Peptide Fragments
4.
Proteins ; 92(3): 343-355, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37874196

ABSTRACT

The design of protein interaction inhibitors is a promising approach to address aberrant protein interactions that cause disease. One strategy in designing inhibitors is to use peptidomimetic scaffolds that mimic the natural interaction interface. A central challenge in using peptidomimetics as protein interaction inhibitors, however, is determining how best the molecular scaffold aligns to the residues of the interface it is attempting to mimic. Here we present the Scaffold Matcher algorithm that aligns a given molecular scaffold onto hotspot residues from a protein interaction interface. To optimize the degrees of freedom of the molecular scaffold we implement the covariance matrix adaptation evolution strategy (CMA-ES), a state-of-the-art derivative-free optimization algorithm, in Rosetta. To evaluate the performance of CMA-ES, we used 26 peptides from the FlexPepDock Benchmark and compared it with three other algorithms in Rosetta: Rosetta's default minimizer, a Monte Carlo protocol of small backbone perturbations, and a genetic algorithm. We test the algorithms on their ability to align a molecular scaffold to a series of hotspot residues (i.e., constraints) along native peptides. Of the four methods, CMA-ES was able to find the lowest energy conformation for all 26 benchmark peptides. Additionally, as a proof of concept, we apply the Scaffold Matcher algorithm with CMA-ES to align a peptidomimetic oligooxopiperazine scaffold to the hotspot residues of the substrate of the main protease of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Our implementation of CMA-ES in Rosetta allows for an alternative optimization method to be used on macromolecular modeling problems with rough energy landscapes. Finally, our Scaffold Matcher algorithm allows for the identification of initial conformations of interaction inhibitors that can be further designed and optimized as high-affinity reagents.


Subjects
Peptidomimetics; Algorithms; Peptides/chemistry; Molecular Conformation; Benchmarking
5.
iScience ; 26(9): 107578, 2023 Sep 15.
Article in English | MEDLINE | ID: mdl-37664629

ABSTRACT

Microbial communities reside at the interface between humans and their environment. Whether the microbiome can be leveraged to gain information on human interaction with museum objects is unclear. To investigate this, we selected objects from the Museum für Naturkunde and the Pergamonmuseum in Berlin, Germany, varying in material and size. Using swabs, we collected 126 samples from natural and cultural heritage objects, which were analyzed through 16S rRNA sequencing. By comparing the microbial composition of touched and untouched objects, we identified a microbial signature associated with human skin microbes. Applying this signature to cultural heritage objects, we identified areas with varying degrees of exposure to human contact on the Ishtar gate and Sam'al gate lions. Furthermore, we differentiated objects touched by two different individuals. Our findings demonstrate that the microbiome of museum objects provides insights into the level of human contact, crucial for conservation, heritage science, and potentially provenance research.

6.
Sci Rep ; 13(1): 12787, 2023 Aug 07.
Article in English | MEDLINE | ID: mdl-37550328

ABSTRACT

We present an artificial neural network architecture, termed STENCIL-NET, for equation-free forecasting of spatiotemporal dynamics from data. STENCIL-NET works by learning a discrete propagator that is able to reproduce the spatiotemporal dynamics of the training data. This data-driven propagator can then be used to forecast or extrapolate dynamics without needing to know a governing equation. STENCIL-NET does not learn a governing equation, nor an approximation to the data themselves. It instead learns a discrete propagator that reproduces the data. It therefore generalizes well to different dynamics and different grid resolutions. By analogy with classic numerical methods, we show that the discrete forecasting operators learned by STENCIL-NET are numerically stable and accurate for data represented on regular Cartesian grids. A once-trained STENCIL-NET model can be used for equation-free forecasting on larger spatial domains and for longer times than it was trained for, as an autonomous predictor of chaotic dynamics, as a coarse-graining method, and as a data-adaptive de-noising method, as we illustrate in numerical experiments. In all tests, STENCIL-NET generalizes better and is computationally more efficient, both in training and inference, than neural network architectures based on local (CNN) or global (FNO) nonlinear convolutions.

7.
BMC Med ; 21(1): 245, 2023 Jul 05.
Article in English | MEDLINE | ID: mdl-37407978

ABSTRACT

BACKGROUND: Due to the asymptomatic nature of the early stages, chronic kidney disease (CKD) is usually diagnosed at late stages and lacks targeted therapy, highlighting the need for new biomarkers to better understand its pathophysiology and to be used for early diagnosis and therapeutic targets. Given the close relationship between CKD and cardiovascular disease (CVD), we investigated the associations of 233 CVD- and inflammation-related plasma proteins with kidney function decline and aimed to assess whether the observed associations are causal. METHODS: We included 1140 participants, aged 55-74 years at baseline, from the Cooperative Health Research in the Region of Augsburg (KORA) cohort study, with a median follow-up time of 13.4 years and 2 follow-up visits. We measured 233 plasma proteins using a proximity extension assay at baseline. In the discovery analysis, linear regression models were used to estimate the associations of 233 proteins with the annual rate of change in creatinine-based estimated glomerular filtration rate (eGFRcr). We further investigated the association of eGFRcr-associated proteins with the annual rate of change in cystatin C-based eGFR (eGFRcys) and eGFRcr-based incident CKD. Two-sample Mendelian randomization was used to infer causality. RESULTS: In the fully adjusted model, 66 out of 233 proteins were inversely associated with the annual rate of change in eGFRcr, indicating that higher baseline protein levels were associated with faster eGFRcr decline. Among these 66 proteins, 21 proteins were associated with both the annual rate of change in eGFRcys and incident CKD. Mendelian randomization analyses on these 21 proteins suggest a potential causal association of higher tumor necrosis factor receptor superfamily member 11A (TNFRSF11A) level with eGFR decline. 
CONCLUSIONS: We reported 21 proteins associated with kidney function decline and incident CKD and provided preliminary evidence suggesting a potential causal association between TNFRSF11A and kidney function decline. Further Mendelian randomization studies are needed to establish a conclusive causal association.


Subjects
Cardiovascular Diseases; Renal Insufficiency, Chronic; Middle Aged; Male; Humans; Female; Aged; Cohort Studies; Proteomics; Renal Insufficiency, Chronic/genetics; Glomerular Filtration Rate; Kidney; Creatinine
8.
Mamm Genome ; 34(2): 200-215, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37221250

ABSTRACT

Echocardiography, a rapid and cost-effective imaging technique, assesses cardiac function and structure. Despite its popularity in cardiovascular medicine and clinical research, image-derived phenotypic measurements are manually performed, requiring expert knowledge and training. Notwithstanding great progress in deep-learning applications in small animal echocardiography, the focus has so far only been on images of anesthetized rodents. We present here a new algorithm specifically designed for echocardiograms acquired in conscious mice called Echo2Pheno, an automatic statistical learning workflow for analyzing and interpreting high-throughput non-anesthetized transthoracic murine echocardiographic images in the presence of genetic knockouts. Echo2Pheno comprises a neural network module for echocardiographic image analysis and phenotypic measurements, including a statistical hypothesis-testing framework for assessing phenotypic differences between populations. Using 2159 images of 16 different knockout mouse strains of the German Mouse Clinic, Echo2Pheno accurately confirms known cardiovascular genotype-phenotype relationships (e.g., Dystrophin) and discovers novel genes (e.g., CCR4-NOT transcription complex subunit 6-like, Cnot6l, and synaptotagmin-like protein 4, Sytl4), which cause altered cardiovascular phenotypes, as verified by H&E-stained histological images. Echo2Pheno provides an important step toward automatic end-to-end learning for linking echocardiographic readouts to cardiovascular phenotypes of interest in conscious mice.


Subjects
Deep Learning; Mice; Animals; Echocardiography/methods; Heart; Algorithms; Phenotype; Ribonucleases
9.
PLoS Comput Biol ; 19(1): e1010820, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36608142

ABSTRACT

In recent years, unsupervised analysis of microbiome data, such as microbial network analysis and clustering, has increased in popularity. Many new statistical and computational methods have been proposed for these tasks. This multiplicity of analysis strategies poses a challenge for researchers, who are often unsure which method(s) to use and might be tempted to try different methods on their dataset to look for the "best" ones. However, if only the best results are selectively reported, this may cause over-optimism: the "best" method is overly fitted to the specific dataset, and the results might be non-replicable on validation data. Such effects will ultimately hinder research progress. Yet so far, these topics have been given little attention in the context of unsupervised microbiome analysis. In our illustrative study, we aim to quantify over-optimism effects in this context. We model the approach of a hypothetical microbiome researcher who undertakes four unsupervised research tasks: clustering of bacterial genera, hub detection in microbial networks, differential microbial network analysis, and clustering of samples. While these tasks are unsupervised, the researcher might still have certain expectations as to what constitutes interesting results. We translate these expectations into concrete evaluation criteria that the hypothetical researcher might want to optimize. We then randomly split an exemplary dataset from the American Gut Project into discovery and validation sets multiple times. For each research task, multiple method combinations (e.g., methods for data normalization, network generation, and/or clustering) are tried on the discovery data, and the combination that yields the best result according to the evaluation criterion is chosen. While the hypothetical researcher might only report this result, we also apply the "best" method combination to the validation dataset. The results are then compared between discovery and validation data. 
In all four research tasks, there are notable over-optimism effects; the results on the validation data set are worse compared to the discovery data, averaged over multiple random splits into discovery/validation data. Our study thus highlights the importance of validation and replication in microbiome analysis to obtain reliable results and demonstrates that the issue of over-optimism goes beyond the context of statistical testing and fishing for significance.
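
The selection effect described in this abstract can be illustrated with a toy simulation. The sketch below is not the study's pipeline: it assumes a set of hypothetical method combinations whose scores are pure noise around the same true quality, so any gap between discovery and validation scores comes from selection alone. The names `simulate_overoptimism`, `n_methods`, and `n_splits` are illustrative.

```python
import numpy as np

def simulate_overoptimism(n_methods=10, n_splits=200, seed=0):
    """Pick the best-scoring method on discovery data, then re-score it on validation data.

    Assumption: every method has the same true quality (0), and each observed
    score is that quality plus independent standard-normal noise.
    """
    rng = np.random.default_rng(seed)
    discovery = rng.normal(size=(n_splits, n_methods))   # scores on discovery sets
    validation = rng.normal(size=(n_splits, n_methods))  # scores on validation sets
    best = discovery.argmax(axis=1)                      # method chosen in each split
    rows = np.arange(n_splits)
    # Average score of the selected method on each half of the data
    return discovery[rows, best].mean(), validation[rows, best].mean()

disc_mean, val_mean = simulate_overoptimism()
print(f"mean selected score, discovery:  {disc_mean:.2f}")
print(f"mean selected score, validation: {val_mean:.2f}")
```

Because the "winner" is chosen for its discovery score, that score is biased upward, while the validation score of the same method is not. Reporting only the former would overstate performance even though no method is actually better.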


Subjects
Microbiota; Machine Learning; Microbial Consortia; Bacteria; Cluster Analysis
10.
BMC Neurosci ; 23(1): 81, 2022 Dec 27.
Article in English | MEDLINE | ID: mdl-36575380

ABSTRACT

Hearing loss is a major health problem and psychological burden in humans. Mouse models offer a possibility to elucidate genes involved in the underlying developmental and pathophysiological mechanisms of hearing impairment. To this end, large-scale mouse phenotyping programs include auditory phenotyping of single-gene knockout mouse lines. Using the auditory brainstem response (ABR) procedure, the German Mouse Clinic and similar facilities worldwide have produced large, uniform data sets of averaged ABR raw data of mutant and wildtype mice. In the course of standard ABR analysis, hearing thresholds are assessed visually by trained staff from series of signal curves of increasing sound pressure level. This is time-consuming and prone to be biased by the reader as well as the graphical display quality and scale. In an attempt to reduce workload and improve quality and reproducibility, we developed and compared two methods for automated hearing threshold identification from averaged ABR raw data: a supervised approach involving two combined neural networks trained on human-generated labels and a self-supervised approach, which exploits the signal power spectrum and combines random forest sound level estimation with a piece-wise curve fitting algorithm for threshold finding. We show that both models work well and are suitable for fast, reliable, and unbiased hearing threshold detection and quality control. In a high-throughput mouse phenotyping environment, both methods perform well as part of an automated end-to-end screening pipeline to detect candidate genes for hearing involvement. Code for both models as well as data used for this work are freely available.


Subjects
Deafness; Evoked Potentials, Auditory, Brain Stem; Humans; Animals; Mice; Evoked Potentials, Auditory, Brain Stem/physiology; Reproducibility of Results; Auditory Threshold/physiology; Hearing/physiology; Acoustic Stimulation/methods
11.
Proc Math Phys Eng Sci ; 478(2262): 20210875, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35756877

ABSTRACT

Remote sensing observations from satellites and global biogeochemical models have combined to revolutionize the study of ocean biogeochemical cycling, but comparing the two data streams to each other and across time remains challenging due to the strong spatial-temporal structuring of the ocean. Here, we show that the Wasserstein distance provides a powerful metric for harnessing these structured datasets for better marine ecosystem and climate predictions. The Wasserstein distance complements commonly used point-wise difference methods such as the root-mean-squared error, by quantifying differences in terms of spatial displacement in addition to magnitude. As a test case, we consider chlorophyll (a key indicator of phytoplankton biomass) in the northeast Pacific Ocean, obtained from model simulations, in situ measurements, and satellite observations. We focus on two main applications: (i) comparing model predictions with satellite observations, and (ii) temporal evolution of chlorophyll both seasonally and over longer time frames. The Wasserstein distance successfully isolates temporal and depth variability and quantifies shifts in biogeochemical province boundaries. It also exposes relevant temporal trends in satellite chlorophyll consistent with climate change predictions. Our study shows that optimal transport vectors underlying the Wasserstein distance provide a novel visualization tool for testing models and better understanding temporal dynamics in the ocean.
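
To make the contrast between point-wise error and spatial displacement concrete, here is a minimal one-dimensional sketch. It is not the paper's implementation, which works with gridded ocean fields; it assumes two non-negative fields on the same regular 1-D grid and uses the closed-form 1-D Wasserstein-1 distance (the integrated absolute difference of the cumulative distributions). The helper `bump` and all values are hypothetical.

```python
import numpy as np

def rmse(a, b):
    """Point-wise root-mean-squared error between two fields."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def wasserstein_1d(a, b, dx=1.0):
    """W1 distance between two non-negative 1-D fields on the same regular grid.

    Each field is normalised to unit mass; in 1-D, W1 equals the integral of
    the absolute difference between the two cumulative distributions.
    """
    pa = a / a.sum()
    pb = b / b.sum()
    return float(np.sum(np.abs(np.cumsum(pa) - np.cumsum(pb))) * dx)

def bump(start, n=100, width=5):
    """Hypothetical patch: a box of ones, `width` cells wide, beginning at `start`."""
    field = np.zeros(n)
    field[start:start + width] = 1.0
    return field

reference = bump(10)
near, far = bump(20), bump(60)   # same patch, displaced by 10 and 50 cells

print("RMSE:       ", rmse(reference, near), rmse(reference, far))
print("Wasserstein:", wasserstein_1d(reference, near), wasserstein_1d(reference, far))
```

Once the patches no longer overlap, RMSE is the same for both displacements, whereas the Wasserstein distance grows with how far the patch has moved. This is the sense in which it captures spatial displacement in addition to magnitude.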

12.
Proc Math Phys Eng Sci ; 478(2262): 20210916, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35756878

ABSTRACT

We present a statistical learning framework for robust identification of differential equations from noisy spatio-temporal data. We address two issues that have so far limited the application of such methods, namely their robustness against noise and the need for manual parameter tuning, by proposing stability-based model selection to determine the level of regularization required for reproducible inference. This avoids manual parameter tuning and improves robustness against noise in the data. Our stability selection approach, termed PDE-STRIDE, can be combined with any sparsity-promoting regression method and provides an interpretable criterion for model component importance. We show that the particular combination of stability selection with the iterative hard-thresholding algorithm from compressed sensing provides a fast and robust framework for equation inference that outperforms previous approaches with respect to accuracy, amount of data required, and robustness. We illustrate the performance of PDE-STRIDE on a range of simulated benchmark problems, and we demonstrate the applicability of PDE-STRIDE on real-world data by considering purely data-driven inference of the protein interaction network for embryonic polarization in Caenorhabditis elegans. Using fluorescence microscopy images of C. elegans zygotes as input data, PDE-STRIDE is able to learn the molecular interactions of the proteins.
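
The iterative hard-thresholding step named in this abstract can be sketched generically as follows. This is a textbook version of the algorithm, not the PDE-STRIDE code, and it leaves out the stability-selection layer. It assumes the columns of `A` play the role of candidate equation terms and that the sparsity level `s` is known; the toy library, `x_true`, and all function names are hypothetical.

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and set the rest to zero."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    out[keep] = x[keep]
    return out

def iht(A, y, s, n_iter=200):
    """Iterative hard thresholding for min ||y - A x||^2 subject to ||x||_0 <= s."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / (largest singular value)^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # Gradient step on the least-squares loss, then projection onto s-sparse vectors
        x = hard_threshold(x + step * A.T @ (y - A @ x), s)
    return x

# Toy problem (assumption): an orthonormal library of 8 candidate terms,
# of which only two are active, observed without noise.
rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.normal(size=(50, 8)))
x_true = np.zeros(8)
x_true[[1, 5]] = [2.0, -1.5]
x_hat = iht(A, A @ x_true, s=2)
print(np.round(x_hat, 3))
```

The orthonormal, noise-free setting is chosen so that recovery is easy to verify. With correlated candidate terms or noisy data, recovery is not guaranteed, which is where a model-selection criterion for the regularization level becomes relevant.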

13.
PLoS Comput Biol ; 18(5): e1010044, 2022 May.
Article in English | MEDLINE | ID: mdl-35533202

ABSTRACT

Statistical analysis of microbial genomic data within epidemiological cohort studies holds the promise to assess the influence of environmental exposures on both the host and the host-associated microbiome. However, the observational character of prospective cohort data and the intricate characteristics of microbiome data make it challenging to discover causal associations between environment and microbiome. Here, we introduce a causal inference framework based on the Rubin Causal Model that can help scientists to investigate such environment-host microbiome relationships, to capitalize on existing, possibly powerful, test statistics, and test plausible sharp null hypotheses. Using data from the German KORA cohort study, we illustrate our framework by designing two hypothetical randomized experiments with interventions of (i) air pollution reduction and (ii) smoking prevention. We study the effects of these interventions on the human gut microbiome by testing shifts in microbial diversity, changes in individual microbial abundances, and microbial network wiring between groups of matched subjects via randomization-based inference. In the smoking prevention scenario, we identify a small interconnected group of taxa worth further scrutiny, including Christensenellaceae and Ruminococcaceae genera, that have been previously associated with blood metabolite changes. These findings demonstrate that our framework may uncover potentially causal links between environmental exposure and the gut microbiome from observational data. We anticipate the present statistical framework to be a good starting point for further discoveries on the role of the gut microbiome in environmental health.
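
Randomization-based inference under a sharp null hypothesis can be sketched with a simple Monte Carlo test. The example below is a simplified illustration, not the authors' framework: it assumes a completely randomized two-group design rather than the matched design used in the study, takes the absolute difference in group means as the test statistic, and uses made-up diversity values.

```python
import numpy as np

def randomization_test(treated, control, n_perm=2000, seed=0):
    """Monte Carlo randomization test of the sharp null of no effect on any unit.

    Under the sharp null the outcomes are fixed and only the group labels are
    random, so the reference distribution is built by reshuffling the labels.
    """
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    pooled = np.concatenate([treated, control])
    n_t = len(treated)
    observed = abs(treated.mean() - control.mean())
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        stat = abs(perm[:n_t].mean() - perm[n_t:].mean())
        hits += stat >= observed - 1e-12   # small tolerance for floating-point ties
    # Add-one correction keeps the Monte Carlo p-value strictly positive
    return (1 + hits) / (1 + n_perm)

# Hypothetical diversity values for two groups of subjects
exposed = [2.1, 2.4, 1.9, 2.0, 2.3, 2.2]
unexposed = [3.0, 3.2, 2.9, 3.1, 3.3, 2.8]
p_value = randomization_test(exposed, unexposed)
print(f"p = {p_value:.4f}")
```

Any test statistic can be substituted for the difference in means, which is what allows such a framework to reuse existing statistics for diversity, abundance, or network structure.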


Subjects
Gastrointestinal Microbiome; Cohort Studies; Environmental Exposure/adverse effects; Gastrointestinal Microbiome/genetics; Humans; Prospective Studies; Random Allocation
14.
Stat Med ; 41(15): 2786-2803, 2022 Jul 10.
Article in English | MEDLINE | ID: mdl-35466418

ABSTRACT

The human microbiome provides essential physiological functions and helps maintain host homeostasis via the formation of intricate ecological host-microbiome relationships. While it is well established that the lifestyle of the host, dietary preferences, demographic background, and health status can influence microbial community composition and dynamics, robust generalizable associations between specific host-associated factors and specific microbial taxa have remained largely elusive. Here, we propose factor regression models that allow the estimation of structured parsimonious associations between host-related features and amplicon-derived microbial taxa. To account for the overdispersed nature of the amplicon sequencing count data, we propose negative binomial reduced rank regression (NB-RRR) and negative binomial co-sparse factor regression (NB-FAR). While NB-RRR encodes the underlying dependency among the microbial abundances as outcomes and the host-associated features as predictors through a rank-constrained coefficient matrix, NB-FAR uses a sparse singular value decomposition of the coefficient matrix. The latter approach avoids the notoriously difficult joint parameter estimation by extracting sparse unit-rank components of the coefficient matrix sequentially, effectively delivering interpretable bi-clusters of taxa and host-associated factors. To solve the nonconvex optimization problems associated with these factor regression models, we present a novel iterative block-wise majorization procedure. Extensive simulation studies and an application to the microbial abundance data from the American Gut Project (AGP) demonstrate the efficacy of the proposed procedure. In the AGP data, we identify several factors that strongly link dietary habits and host lifestyle to specific microbial families.


Subjects
Data Analysis; Microbiota; Factor Analysis, Statistical; Feeding Behavior; Gastrointestinal Microbiome; Humans; Life Style; Regression Analysis; United States
15.
Sci Adv ; 8(3): eabl4930, 2022 Jan 21.
Article in English | MEDLINE | ID: mdl-35061539

ABSTRACT

Extensive microdiversity within Prochlorococcus, the most abundant marine cyanobacterium, occurs at scales from a single droplet of seawater to ocean basins. To interpret the structuring role of variations in genetic potential, as well as metabolic and physiological acclimation, we developed a mechanistic constraint-based modeling framework that incorporates the full suite of genes, proteins, metabolic reactions, pigments, and biochemical compositions of 69 sequenced isolates spanning the Prochlorococcus pangenome. Optimizing each strain to the local, observed physical and chemical environment along an Atlantic Ocean transect, we predicted variations in strain-specific patterns of growth rate, metabolic configuration, and physiological state, defining subtle niche subspaces directly attributable to differences in their encoded metabolic potential. Predicted growth rates covaried with observed ecotype abundances, affirming their significance as a measure of fitness and inferring a nonlinear density dependence of mortality. Our study demonstrates the potential to interpret global-scale ecosystem organization in terms of cellular-scale processes.

16.
Front Genet ; 12: 766405, 2021.
Article in English | MEDLINE | ID: mdl-34950190

ABSTRACT

Accurate generative statistical modeling of count data is of critical relevance for the analysis of biological datasets from high-throughput sequencing technologies. Important instances include the modeling of microbiome compositions from amplicon sequencing surveys and the analysis of cell type compositions derived from single-cell RNA sequencing. Microbial and cell type abundance data share remarkably similar statistical features, including their inherent compositionality and a natural hierarchical ordering of the individual components from taxonomic or cell lineage tree information, respectively. To this end, we introduce a Bayesian model for tree-aggregated amplicon and single-cell compositional data analysis (tascCODA) that seamlessly integrates hierarchical information and experimental covariate data into the generative modeling of compositional count data. By combining latent parameters based on the tree structure with spike-and-slab Lasso penalization, tascCODA can determine covariate effects across different levels of the population hierarchy in a data-driven parsimonious way. In the context of differential abundance testing, we validate tascCODA's excellent performance on a comprehensive set of synthetic benchmark scenarios. Our analyses on human single-cell RNA-seq data from ulcerative colitis patients and amplicon data from patients with irritable bowel syndrome, respectively, identified aggregated cell type and taxon compositional changes that were more predictive and parsimonious than those proposed by other schemes. We posit that tascCODA constitutes a valuable addition to the growing statistical toolbox for generative modeling and analysis of compositional changes in microbial or cell population data.

17.
Sci Rep ; 11(1): 14505, 2021 Jul 15.
Article in English | MEDLINE | ID: mdl-34267244

ABSTRACT

Modern high-throughput sequencing technologies provide low-cost microbiome survey data across all habitats of life at unprecedented scale. At the most granular level, the primary data consist of sparse counts of amplicon sequence variants or operational taxonomic units that are associated with taxonomic and phylogenetic group information. In this contribution, we leverage the hierarchical structure of amplicon data and propose a data-driven and scalable tree-guided aggregation framework to associate microbial subcompositions with response variables of interest. The excess number of zero or low count measurements at the read level forces traditional microbiome data analysis workflows to remove rare sequencing variants or group them by a fixed taxonomic rank, such as genus or phylum, or by phylogenetic similarity. By contrast, our framework, which we call trac (tree-aggregation of compositional data), learns data-adaptive taxon aggregation levels for predictive modeling, greatly reducing the need for user-defined aggregation in preprocessing while simultaneously integrating seamlessly into the compositional data analysis framework. We illustrate the versatility of our framework in the context of large-scale regression problems in human gut, soil, and marine microbial ecosystems. We posit that the inferred aggregation levels provide highly interpretable taxon groupings that can help microbiome researchers gain insights into the structure and functioning of the underlying ecosystem of interest.


Subjects
HIV Infections/microbiology; Microbiota; Models, Theoretical; Soil Microbiology; Water Microbiology; Archaea/genetics; Bacteria/genetics; Databases, Factual; Feces/microbiology; Gastrointestinal Microbiome; HIV Infections/immunology; Humans; Hydrogen-Ion Concentration; Lipopolysaccharide Receptors/immunology; Microbiota/genetics; Microbiota/physiology; RNA, Ribosomal, 16S; Salinity; Soil/chemistry
18.
Phys Rev E ; 103(4-1): 042310, 2021 Apr.
Article in English | MEDLINE | ID: mdl-34005966

ABSTRACT

We propose a statistical learning framework based on group-sparse regression that can be used to (i) enforce conservation laws, (ii) ensure model equivalence, and (iii) guarantee symmetries when learning or inferring differential-equation models from data. Directly learning interpretable mathematical models from data has emerged as a valuable modeling approach. However, in areas such as biology, high noise levels, sensor-induced correlations, and strong intersystem variability can render data-driven models nonsensical or physically inconsistent without additional constraints on the model structure. Hence, it is important to leverage prior knowledge from physical principles to learn biologically plausible and physically consistent models rather than models that simply fit the data best. We present the group iterative hard thresholding algorithm and use stability selection to infer physically consistent models with minimal parameter tuning. We show several applications from systems biology that demonstrate the benefits of enforcing priors in data-driven modeling.

19.
Brief Bioinform ; 22(4), 2021 Jul 20.
Article in English | MEDLINE | ID: mdl-33264391

ABSTRACT

MOTIVATION: Estimating microbial association networks from high-throughput sequencing data is a common exploratory data analysis approach aiming at understanding the complex interplay of microbial communities in their natural habitat. Statistical network estimation workflows comprise several analysis steps, including methods for zero handling, data normalization and computing microbial associations. Since microbial interactions are likely to change between conditions, e.g. between healthy individuals and patients, identifying network differences between groups is often an integral secondary analysis step. Thus far, however, no unifying computational tool is available that facilitates the whole analysis workflow of constructing, analysing and comparing microbial association networks from high-throughput sequencing data. RESULTS: Here, we introduce NetCoMi (Network Construction and comparison for Microbiome data), an R package that integrates existing methods for each analysis step in a single reproducible computational workflow. The package offers functionality for constructing and analysing single microbial association networks as well as quantifying network differences. This enables insights into whether single taxa, groups of taxa or the overall network structure change between groups. NetCoMi also contains functionality for constructing differential networks, thus allowing to assess whether single pairs of taxa are differentially associated between two groups. Furthermore, NetCoMi facilitates the construction and analysis of dissimilarity networks of microbiome samples, enabling a high-level graphical summary of the heterogeneity of an entire microbiome sample collection. We illustrate NetCoMi's wide applicability using data sets from the GABRIELA study to compare microbial associations in settled dust from children's rooms between samples from two study centers (Ulm and Munich). 
AVAILABILITY: R scripts used for producing the examples shown in this manuscript are provided as supplementary data. The NetCoMi package, together with a tutorial, is available at https://github.com/stefpeschel/NetCoMi. CONTACT: Tel:+49 89 3187 43258; stefanie.peschel@mail.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.


Subjects
Databases, Nucleic Acid; High-Throughput Nucleotide Sequencing; Microbiota/genetics; Software; Humans
20.
J Comput Graph Stat ; 30(4): 1249-1256, 2021.
Article in English | MEDLINE | ID: mdl-35280976

ABSTRACT

Latent Gaussian copula models provide a powerful means to perform multi-view data integration since these models can seamlessly express dependencies between mixed variable types (binary, continuous, zero-inflated) via latent Gaussian correlations. The estimation of these latent correlations, however, comes at considerable computational cost, having prevented the routine use of these models on high-dimensional data. Here, we propose a new computational approach for estimating latent correlations via a hybrid multilinear interpolation and optimization scheme. Our approach speeds up the current state of the art computation by several orders of magnitude, thus allowing fast computation of latent Gaussian copula models even when the number of variables p is large. We provide theoretical guarantees for the approximation error of our numerical scheme and support its excellent performance on simulated and real-world data. We illustrate the practical advantages of our method on high-dimensional sparse quantitative and relative abundance microbiome data as well as multi-view data from The Cancer Genome Atlas Project. Our method is implemented in the R package mixedCCA, available at https://github.com/irinagain/mixedCCA.
